atomic event
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
Tang, Changli, Li, Yixuan, Yang, Yudong, Zhuang, Jimin, Sun, Guangzhi, Li, Wei, Ma, Zejun, Zhang, Chao
We present video-SALMONN 2, a family of audio-visual large language models that set new state-of-the-art (SOTA) results in video description and question answering (QA). Our core contribution is multi-round direct preference optimisation (MrDPO), paired with a caption-quality objective that jointly rewards completeness and factual accuracy. Unlike standard DPO with a fixed reference policy, MrDPO periodically refreshes the reference by bootstrapping from a newly re-initialised lightweight adapter trained on the latest preferences, avoiding reference staleness and enabling continual improvement. This strategy produces captions that are consistently more detailed and accurate than those from proprietary systems such as GPT-4o and Gemini-1.5 Pro. We further distil these gains by using our model to generate a high-quality video-caption corpus for supervised fine-tuning of new models, transferring benefits beyond captioning to strong performance on complex video-QA tasks. Across widely used audio-visual and visual-only understanding benchmarks (including Video-MME, WorldSense, AVUT, Video-Holmes, DailyOmni, MLVU, and LVBench), our 3B and 7B models achieve SOTA results at comparable scales, while the 72B model surpasses all other open-source systems. Our source code, models, and data are released at https://github.com/bytedance/video-SALMONN-2.
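To make the role of the reference policy concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the MrDPO procedure from the paper: the log-probabilities are made-up numbers, and "refreshing" the reference is simplified to setting it equal to the current policy, which is an assumption for illustration only.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # Margin: how much more strongly the policy prefers the chosen caption
    # over the rejected one, relative to the reference.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form.
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities for one (chosen, rejected) caption pair.
policy = {"chosen": -12.0, "rejected": -15.0}
stale_ref = {"chosen": -14.0, "rejected": -14.5}

# Loss measured against an old, fixed reference.
loss_stale = dpo_loss(policy["chosen"], policy["rejected"],
                      stale_ref["chosen"], stale_ref["rejected"])

# Simplified refresh (assumption): the reference becomes the current policy,
# so the margin is zero and the loss returns to log(2).
loss_fresh = dpo_loss(policy["chosen"], policy["rejected"],
                      policy["chosen"], policy["rejected"])
```

With a stale reference the policy has already moved away from it, so the loss on this pair is below log(2); after the simplified refresh the margin is measured from the new starting point.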
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
NARCE: A Mamba-Based Neural Algorithmic Reasoner Framework for Online Complex Event Detection
Han, Liying, Dong, Gaofeng, Ouyang, Xiaomin, Kaplan, Lance, Cerutti, Federico, Srivastava, Mani
Current machine learning models excel in short-span perception tasks but struggle to derive high-level insights from long-term observation, a capability central to understanding complex events (CEs). CEs, defined as sequences of short-term atomic events (AEs) governed by spatiotemporal rules, are challenging to detect online due to the need to extract meaningful patterns from long and noisy sensor data while ignoring irrelevant events. We hypothesize that state-based methods are well-suited for CE detection, as they capture event progression through state transitions without requiring long-term memory. Baseline experiments validate this, demonstrating that the state-space model Mamba outperforms existing architectures. However, Mamba's reliance on extensive labeled data, which are difficult to obtain, motivates our second hypothesis: decoupling CE rule learning from noisy sensor data can reduce data requirements. To address this, we propose NARCE, a framework that uses Neural Algorithmic Reasoning (NAR) to split the task into two components: (i) learning CE rules independently of sensor data using synthetic concept traces generated by LLMs and (ii) mapping sensor inputs to these rules via an adapter. Our results show that NARCE outperforms baselines in accuracy, generalization to unseen and longer sensor data, and data efficiency, significantly reducing annotation costs while advancing robust CE detection.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy (0.04)
- Asia > China > Hong Kong (0.04)
- Health & Medicine (0.92)
- Government > Regional Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting
Chang, He, Ye, Chenchen, Tao, Zhulin, Wu, Jie, Yang, Zhengmao, Ma, Yunshan, Huang, Xianglin, Chua, Tat-Seng
Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluation of LLM-based methods for temporal event forecasting. Due to the lack of a high-quality dataset that involves both graph and textual data, we first construct a benchmark dataset, named MidEast-TE-mini. Based on this dataset, we design a series of baseline methods, characterized by various input formats and retrieval augmented generation (RAG) modules. From extensive experiments, we find that directly integrating raw texts into the input of LLMs does not enhance zero-shot extrapolation performance. In contrast, incorporating raw texts in specific complex events and fine-tuning LLMs significantly improves performance. Moreover, enhanced with retrieval modules, LLMs can effectively capture temporal relational patterns hidden in historical events. Meanwhile, issues such as popularity bias and the long-tail problem still persist in LLMs, particularly in the RAG-based method. These findings not only deepen our understanding of LLM-based event forecasting methods but also highlight several promising research directions. We consider that this comprehensive evaluation, along with the identified research opportunities, will significantly contribute to future research on temporal event forecasting through LLMs.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- Asia > Middle East > Israel (0.05)
An Empirical Evaluation of Neural and Neuro-symbolic Approaches to Real-time Multimodal Complex Event Detection
Han, Liying, Srivastava, Mani B.
Robots and autonomous systems require an understanding of complex events (CEs) from sensor data to interact with their environments and humans effectively. Traditional end-to-end neural architectures, despite processing sensor data efficiently, struggle with long-duration events due to limited context sizes and reasoning capabilities. Recent advances in neuro-symbolic methods, which integrate neural and symbolic models leveraging human knowledge, promise improved performance with less data. This study addresses the gap in understanding these approaches' effectiveness in complex event detection (CED), especially in temporal reasoning. We investigate neural and neuro-symbolic architectures' performance in a multimodal CED task, analyzing IMU and acoustic data streams to recognize CE patterns. Our methodology includes (i) end-to-end neural architectures for direct CE detection from sensor embeddings, (ii) two-stage concept-based neural models mapping sensor embeddings to atomic events (AEs) before CE detection, and (iii) a neuro-symbolic approach using a symbolic finite-state machine for CE detection from AEs. Empirically, the neuro-symbolic architecture significantly surpasses purely neural models, demonstrating superior performance in CE recognition, even with extensive training data and ample temporal context for neural approaches.
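The symbolic stage in (iii) can be pictured as a small finite-state machine that consumes a stream of atomic-event labels. The sketch below is only an illustration: the event names, the three-step pattern, and the function `detect_complex_event` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List

# Hypothetical complex-event rule: "open_door", then "enter", then "close_door",
# with any other atomic events allowed in between.
PATTERN = ["open_door", "enter", "close_door"]


def detect_complex_event(atomic_events: Iterable[str],
                         pattern: List[str] = PATTERN) -> List[int]:
    """Return the stream indices at which the pattern completes."""
    state = 0  # number of pattern steps matched so far
    detections = []
    for i, event in enumerate(atomic_events):
        if event == pattern[state]:
            state += 1
            if state == len(pattern):
                detections.append(i)
                state = 0  # reset to look for the next occurrence
        # Any other atomic event leaves the state unchanged.
    return detections


stream = ["walk", "open_door", "talk", "enter", "walk", "close_door", "open_door"]
print(detect_complex_event(stream))  # [5]
```

Because the machine only stores its current state, it can track a pattern of arbitrary duration without keeping the full sensor history in context.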
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Dominican Republic (0.04)
- Information Technology (0.68)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Structured, Complex and Time-complete Temporal Event Forecasting
Ma, Yunshan, Ye, Chenchen, Wu, Zijian, Wang, Xiang, Cao, Yixin, Pang, Liang, Chua, Tat-Seng
Temporal event forecasting aims to predict what will happen next given the observed events in history. Previous formulations of temporal events are unstructured, atomic, or lacking full temporal information, thus largely restricting the representation quality and forecasting ability of temporal events. To address these limitations, we introduce a novel formulation for Structured, Complex, and Time-complete Temporal Event (SCTc-TE). Based on this new formulation, we develop a simple and fully automated pipeline for constructing such SCTc-TEs from a large number of news articles. Furthermore, we propose a novel model that leverages both Local and Global contexts for SCTc-TE forecasting, named LoGo. To evaluate our model, we construct two large-scale datasets named MidEast-TE and GDELT-TE. Extensive evaluations demonstrate the advantages of our datasets in multiple aspects, while experimental results justify the effectiveness of our forecasting model LoGo. We release the code and dataset via https://github.com/yecchen/GDELT-ComplexEvent.
- Asia > Middle East > Lebanon (0.15)
- Europe > Ukraine (0.14)
- Africa > Middle East > Egypt (0.14)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
A Diffusion Model for Event Skeleton Generation
Zhu, Fangqi, Zhang, Lin, Gao, Jun, Qin, Bing, Xu, Ruifeng, Yang, Haiqin
Event skeleton generation, aiming to induce an event schema skeleton graph with abstracted event nodes and their temporal relations from a set of event instance graphs, is a critical step in the temporal complex event schema induction task. Existing methods effectively address this task from a graph generation perspective but suffer from noise sensitivity and error accumulation, e.g., the inability to correct errors while generating schema. We, therefore, propose a novel Diffusion Event Graph Model (DEGM) to address these issues. Our DEGM is the first workable diffusion model for event skeleton generation, where the embedding and rounding techniques with a custom edge-based loss are introduced to transform a discrete event graph into a learnable latent representation. Furthermore, we propose a denoising training process to maintain the model's robustness. Consequently, DEGM derives the final schema, where error correction is guaranteed by iteratively refining the latent representation during the schema generation process. Experimental results on three IED bombing datasets demonstrate that our DEGM achieves better results than other state-of-the-art baselines. Our code and data are available at https://github.com/zhufq00/EventSkeletonGeneration.
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Afghanistan > Kabul Province > Kabul (0.04)
Learning Temporal Rules from Noisy Timeseries Data
Samel, Karan, Zhao, Zelin, Chen, Binghong, Li, Shuang, Subramanian, Dharmashankar, Essa, Irfan, Song, Le
Events across a timeline are a common data representation, seen in different temporal modalities. Individual atomic events can occur in a certain temporal ordering to compose higher level composite events. Examples of a composite event are a patient's medical symptom or a baseball player hitting a home run, caused by distinct temporal orderings of patient vitals and player movements respectively. Such salient composite events are provided as labels in temporal datasets and most works optimize models to predict these composite event labels directly. We focus on uncovering the underlying atomic events and their relations that lead to the composite events within a noisy temporal data setting. We propose Neural Temporal Logic Programming (Neural TLP) which first learns implicit temporal relations between atomic events and then lifts logic rules for composite events, given only the composite event labels for supervision. This is done through efficiently searching through the combinatorial space of all temporal logic rules in an end-to-end differentiable manner. We evaluate our method on video and healthcare datasets where it outperforms the baseline methods for rule discovery.
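As a rough illustration of what a temporal rule over atomic events looks like, the sketch below evaluates one hand-written "A before B within a time window" rule on a timestamped timeline. Neural TLP learns such rules from data; here the rule, the event labels, and the helper `holds_before` are all hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, atomic event label)


def holds_before(events: List[Event], first: str, second: str, max_gap: float) -> bool:
    """True if some `first` event precedes some `second` event by at most max_gap."""
    first_times = [t for t, label in events if label == first]
    second_times = [t for t, label in events if label == second]
    return any(0 < t2 - t1 <= max_gap for t1 in first_times for t2 in second_times)


# Hypothetical baseball timeline: the composite event "home_run" is declared
# when a "swing" is followed by "ball_leaves_field" within 5 seconds.
timeline = [(0.0, "pitch"), (0.6, "swing"), (4.1, "ball_leaves_field")]
is_home_run = holds_before(timeline, "swing", "ball_leaves_field", max_gap=5.0)
```

Rule discovery then amounts to searching over which labels and which temporal relations to plug into predicates of this kind, using only the composite label as supervision.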
- Health & Medicine (1.00)
- Leisure & Entertainment > Sports > Baseball (0.68)
The Synergy of Complex Event Processing and Tiny Machine Learning in Industrial IoT
Ren, Haoyu, Anicic, Darko, Runkler, Thomas
Focusing on comprehensive networking, big data, and artificial intelligence, the Industrial Internet-of-Things (IIoT) facilitates efficiency and robustness in factory operations. Various sensors and field devices play a central role, as they generate a vast amount of real-time data that can provide insights into manufacturing. The synergy of complex event processing (CEP) and machine learning (ML) has been developed actively in recent years in IIoT to identify patterns in heterogeneous data streams and fuse raw data into tangible facts. In a traditional compute-centric paradigm, the raw field data are continuously sent to the cloud and processed centrally. As IIoT devices become increasingly pervasive and ubiquitous, concerns are raised since transmitting such amounts of data is energy-intensive, vulnerable to interception, and subject to high latency. The data-centric paradigm can essentially solve these problems by empowering IIoT to perform decentralized on-device ML and CEP, keeping data primarily on edge devices and minimizing communications. However, this is no mean feat because most IIoT edge devices are designed to be computationally constrained with low power consumption. This paper proposes a framework that exploits ML and CEP's synergy at the edge in distributed sensor networks. By leveraging tiny ML and micro CEP, we shift the computation from the cloud to the power-constrained IIoT devices and allow users to adapt the on-device ML model and the CEP reasoning logic flexibly on the fly without re-uploading the whole program. Lastly, we evaluate the proposed solution and show its effectiveness and feasibility using an industrial use case of machine safety monitoring.
- Europe > Italy (0.30)
- Europe > Germany (0.14)
- Oceania > New Zealand (0.14)
- Information Technology > Smart Houses & Appliances (0.35)
- Energy > Oil & Gas > Upstream (0.34)
Extending and Automating Basic Probability Theory with Propositional Computability Logic
Classical probability theory [2] is formulated using sets. Unfortunately, the language of sets lacks expressiveness and is, in a sense, a low-level 'assembly language' of probability theory. In this paper, we develop a 'high-level approach' to classical probability theory with propositional computability logic [1] (CoL). Unlike other formalisms such as sets, logic and linear logic, computability logic is built on the notion of events/games, which is central to probability theory. Therefore, CoL is a perfect place to begin the study of automating probability theory. To be specific, CoL is well-suited to describing complex (sequential/parallel) experiments and events, and more expressive than set operations. In contrast, classical probability theory - based on set operations - is designed to represent mainly the simple/additive events - the events that occur under a single experiment. Naturally, we need to talk about composite/multiplicative events - events that occur under two different experiments. Developing probability along this line requires a new, powerful language.
A Bayesian Variant of Shafer's Commonalities For Modelling Unforeseen Events
Shafer's theory of belief and the Bayesian theory of probability are two alternative and mutually inconsistent approaches toward modelling uncertainty in artificial intelligence. To help reduce the conflict between these two approaches, this paper reexamines expected utility theory, from which Bayesian probability theory is derived. Expected utility theory requires the decision maker to assign a utility to each decision conditioned on every possible event that might occur. But frequently the decision maker cannot foresee all the events that might occur, i.e., one of the possible events is the occurrence of an unforeseen event. So once we acknowledge the existence of unforeseen events, we need to develop some way of assigning utilities to decisions conditioned on unforeseen events. The commonsensical solution to this problem is to assign similar utilities to events which are similar. Implementing this commonsensical solution is equivalent to replacing Bayesian subjective probabilities over the space of foreseen and unforeseen events by random set theory probabilities over the space of foreseen events. This leads to an expected utility principle in which normalized variants of Shafer's commonalities play the role of subjective probabilities. Hence allowing for unforeseen events in decision analysis causes Bayesian probability theory to become much more similar to Shaferian theory.
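For orientation, the classical expected utility principle that the paper starts from can be sketched in a few lines. This shows only the standard Bayesian case over foreseen events, not the commonality-based variant the paper derives; the events, probabilities, and utilities are invented for illustration.

```python
def expected_utility(probabilities, utilities):
    """Sum of P(event) * U(decision | event) over all foreseen events."""
    return sum(probabilities[event] * utilities[event] for event in probabilities)


# Hypothetical subjective probabilities over the foreseen events.
probs = {"rain": 0.3, "sun": 0.7}

# Hypothetical utility of each decision conditioned on each event.
utilities = {
    "take_umbrella": {"rain": 5, "sun": 3},
    "leave_umbrella": {"rain": -10, "sun": 6},
}

scores = {decision: expected_utility(probs, u) for decision, u in utilities.items()}
best = max(scores, key=scores.get)  # decision with the highest expected utility
```

The table of utilities must have an entry for every event with non-zero probability, which is exactly the requirement that breaks down once unforeseen events are admitted.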
- North America > United States > New York (0.05)
- North America > United States > New Jersey (0.04)
- North America > United States > Michigan (0.04)
- Europe > Netherlands > North Holland (0.04)